Predicting coal workers’ pneumoconiosis trends: Leveraging historical data with the GARCH model in a Chinese Miner Cohort

Coal workers’ pneumoconiosis (CWP) is one of the most common and severe occupational diseases worldwide. The main risk factor for CWP is exposure to respirable mine dust. Prediction theory has been widely applied to forecasting epidemics. Here, it was used to characterize the current state of CWP and to forecast its future incidence trends. Eight thousand nine hundred twenty-eight coal workers from a state-owned coal mine were included during the observation period from 1963 to 2014. Over the observation period, the dust concentration gradually decreased, and the incidence of CWP among tunnel and mine workers and among transportation and assistance workers showed an overall downward trend. We selected the better prediction model by comparing the predictive performance of the Autoregressive Integrated Moving Average (ARIMA) model and the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model. Compared with the ARIMA model, the GARCH model predicted better. Furthermore, the current and predicted future incidence of CWP among coal miners remains at a high level.


Introduction
Coal workers' pneumoconiosis (CWP) is one of the most common and severe occupational diseases and is caused by prolonged inhalation of coal dust containing free crystalline silica. [1,2] CWP is still a global occupational health problem, especially in developing countries. [1,3,4] Inhalation of coal mine dust induces chronic inflammation and progressive fibrosis, eventually leading to respiratory failure and death. [5] Unfortunately, it is an irreversible progression, and no effective treatment exists. [6][9] It accounted for 85% to 90% of all reported occupational diseases annually in China during the last decade. [10] Searching for the epidemiological characteristics of CWP in the past and present will help us to predict the future morbidity trend. [13] In addition, disease prediction is a crucial step in transforming from passive prevention to active prevention, promoting the overall levels of prevention and control of occupational diseases and other public health issues. [11] Here, a historical cohort study was performed to explore the epidemiological situation of CWP. All the workers exposed to coal mine dust are from 1 state-owned mine in eastern China with a mining history of 60 years, which ensures that the subjects are homogeneous. Our primary purpose is to predict the future incidence rates of CWP and provide reliable evidence for the government to develop more effective occupational health strategies.

PS and BW contributed equally to this work.

This work was supported by the Scientific Research Project of Jiangsu Health Committee (no. M2022083), Natural Science Foundation of Jiangsu Province (no. BK20230742) and the Open Project of Key Laboratory of Environmental Medicine Engineering of the Ministry of Education (no. 2022EME001), Jiangsu Provincial Medical Key Discipline (Laboratory) (no. ZDXK202249), and Medical research key project of Jiangsu Provincial Commission (no. K2019026).
Consent was given by all contributing authors.

The authors have no conflicts of interest to disclose.
The datasets generated during and/or analyzed during the current study are not publicly available, but are available from the corresponding author on reasonable request.
In abiding with the ethical requirements, the study conformed to the Declaration of Helsinki, and was nominated to be exempted from institutional ethical review by the Research Ethics Board of Jiangsu Provincial CDC. Official permission was taken from each respondent for this study, and informed consent was obtained from all participants.

Patient and public involvement
Patients and/or the public were not involved in this research's design, conduct, reporting, or dissemination plans.

Occupational category
There are several job classifications in the underground mine.
It was difficult to separate the operation workers because of the irregular rotation of these 2 jobs among underground coal workers in this mine. Therefore, the miners were sorted into 2 groups according to mine dust exposure level and job title in this study: an operation group and a materials handling group.

Dust exposure data
Dust samples were collected twice a month at random from each monitoring point. The dust concentration and free silica content of each working area were measured by the gravimetric and pyrophosphate methods, respectively. The cumulative dust exposure (CDE) was calculated for every coal worker by multiplying the duration in years by the dust concentration. It is an important index that is available for each subject:

$$\mathrm{CDE} = \sum_{j=1}^{n} C_j \times T_j$$

In the above formula, n is the number of job types held by each worker during the observation period; C_j is the yearly geometric mean of the 8-hour time-weighted average (8h-TWA) dust concentration; and T_j is the number of years observed. CDE is given in mg•years.
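As an illustration of the CDE definition (concentration multiplied by duration, summed over the jobs a worker held), here is a minimal Python sketch. The function names and the example worker are hypothetical and are not taken from the study data.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of positive dust concentrations (e.g. yearly 8h-TWA samples)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("concentrations must be positive and non-empty")
    return exp(sum(log(v) for v in values) / len(values))


def cumulative_dust_exposure(job_history):
    """Sum C_j * T_j over the jobs a worker held.

    job_history: list of (C_j, T_j) pairs, where C_j is the dust
    concentration for job j and T_j is the number of years in that job.
    """
    return sum(c * t for c, t in job_history)


# Hypothetical worker (illustrative values only):
# 10 years at concentration 30, then 15 years at concentration 8.
history = [(30.0, 10), (8.0, 15)]
print(cumulative_dust_exposure(history))  # 420.0
```

Under these assumed values the worker would fall in the 100 to 1000 range of CDE used later in the results.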

Model introduction
2.6.1. Autoregressive Integrated Moving Average model
The Autoregressive Integrated Moving Average (ARIMA) model is a time series forecasting and analysis method. The model treats the data formed by the research object as a random sequence and excludes individual outliers caused by confounding factors. The entire data set is regarded as a random variable that depends on time. This dependence indicates the future development trend of the object. Once this correlation has been described, future values of the time series can be predicted from the model.
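For reference, a common textbook way of writing the ARIMA(p, d, q) model is given below. The notation (backshift operator B, coefficients \phi_i and \theta_j, constant c, white-noise term \varepsilon_t) is our assumption for illustration and does not come from the paper.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right)(1 - B)^{d}\, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right)\varepsilon_t,
\qquad B^{k} y_t = y_{t-k}
```

With p = 1 and d = q = 0 this reduces to a first-order autoregression, y_t = c + \phi_1 y_{t-1} + \varepsilon_t.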

2.6.2. Generalized Autoregressive Conditional Heteroskedasticity model
The full name of the GARCH model is the "generalized autoregressive conditional heteroskedasticity model." It plays a critical role in modeling time series variables in econometrics and can fully extract the residual information in the data. The GARCH model is an extension of the ARCH model: it retains the advantages of ARCH and can also account for both the calm and the volatile periods of a time series. Moreover, lower-order GARCH models can represent higher-order ARCH models, making model identification and estimation easier.
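To make the last point concrete, the sketch below computes the conditional variance of a GARCH(1,1) process and checks that it equals a long ARCH-type sum of past squared residuals. The parameter names (omega, alpha, beta), the starting value, and the residuals are illustrative assumptions and are not estimates from the cohort.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion
    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1],
    started (by assumption) at the unconditional variance."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0, alpha + beta < 1")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


def arch_expansion(residuals, omega, alpha, beta, t, sigma2_0):
    """The same variance at time t, unrolled into an ARCH-type sum:
    geometrically decaying weights on all past squared residuals."""
    lagged = sum(beta ** k * residuals[t - 1 - k] ** 2 for k in range(t))
    constant = omega * sum(beta ** k for k in range(t))
    return constant + alpha * lagged + beta ** t * sigma2_0


# Hypothetical residuals with one large shock at index 3.
eps = [0.5, -1.2, 0.3, 2.0, -0.4, 0.1]
sigma2 = garch11_variance(eps, omega=0.1, alpha=0.2, beta=0.7)
unrolled = arch_expansion(eps, 0.1, 0.2, 0.7, t=5, sigma2_0=sigma2[0])
print(sigma2[5], unrolled)  # the two values agree up to rounding
```

The variance jumps right after the large shock and then decays, which is how a GARCH model separates volatile periods from calm ones with only three parameters.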

Statistical analysis
All data were analyzed using Excel 2007 and SPSS 22.0. P < .05 was considered statistically significant.

Baseline characteristics
A total of 8928 coal workers were included in the study. Among them, there were 495 patients (5.54%) with CWP and 8433 coal workers (94.46%) without CWP. For the 495 CWP patients, the average age at onset of CWP was 50.20 ± 11.03 years, the average duration of dust exposure was 26.75 ± 8.51 years, and the average age at first dust exposure was 20.75 ± 4.22 years. For the 8433 coal workers without CWP, the average age was 43.82 ± 12.48 years, the average duration of dust exposure was 24.12 ± 10.48 years, and the average age at first dust exposure was 20.07 ± 4.34 years. The average dust concentrations in different workplaces decreased over time (Table 1). Among patients with CWP, more than 90% were operation workers, a significantly higher proportion than among coal workers without CWP (Table 2). The CDE exceeded 1000 mg•years in 167 CWP patients (74.89%) and 246 coal workers without CWP (4.78%). Similarly, the CDE was between 100 and 1000 mg•years in 56 CWP patients (25.11%) and 4898 coal workers without CWP (95.22%).
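The reported proportions can be rechecked directly from the counts in this paragraph. The short sketch below does so. Note that the CDE percentages reproduce only when the denominators are the sums of the two reported CDE categories (223 and 5144), not the full groups of 495 and 8433. That reading is our own arithmetic inference and is not stated in the text.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Cohort split: counts taken from the paragraph above.
print(pct(495, 8928), pct(8433, 8928))        # 5.54 94.46

# CDE categories: denominators are the sums of the reported counts
# (an inference from the arithmetic, not a statement in the paper).
cwp_with_cde = 167 + 56                       # 223
non_cwp_with_cde = 246 + 4898                 # 5144
print(pct(167, cwp_with_cde), pct(56, cwp_with_cde))            # 74.89 25.11
print(pct(246, non_cwp_with_cde), pct(4898, non_cwp_with_cde))  # 4.78 95.22
```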
According to Table 3, there was no statistically significant difference in age or years of dust exposure between the materials handling group and the operation group. The number of male cases was significantly higher than the number of female cases in both groups. The accumulated dust exposure of the materials handling group was significantly higher than that of the operation group.

Incidence rate of CWP
The annual incidence of CWP among mine workers over the observation period from 1963 to 2014 is shown in Figure 1. From 1963 to 1995, the incidence of CWP in the entire mine rose overall, reaching a maximum of 6.43% in 1995. Since then, the incidence of CWP has remained at a low level except in 2010, reaching 0 in 2013 and 2014. Figure 2 shows the annual incidence of CWP among tunnel workers, mine workers, transport workers, and helper workers. As can be seen from Table 2, the tunnel and mine workers group had a much higher incidence rate than the transport and helper workers group, with the largest difference in CWP incidence between the 2 groups, 1.16%, occurring in 1995.
The CDE of all miners exceeded 100 mg•years, and over the observation years the incidence of CWP in the two CDE categories gradually fell to a lower level (Figure 3).
Taking the incidence rate of workers in the mine from 1963 to 2014 as the dependent variable, the ARIMA model was built in the time series module of SPSS 22.0 software. From Figure 5 (residual ACF and residual PACF diagrams), p = 1 and q = 0 can be obtained. Finally, we determined the annual incidence rate model of pneumoconiosis in the mine to be ARIMA (1, 0, 0) and drew the autocorrelation function (ACF) and partial autocorrelation function (PACF) charts of the residual sequence. The Box-Ljung Q statistic of the residuals was not statistically significant (P = .939), indicating that the residual sequence is white noise, and the fitted R² was 0.404, so using ARIMA (1, 0, 0) to predict the future incidence rate in the mine is reasonable.
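The workflow described here (fit ARIMA (1, 0, 0), inspect the residual autocorrelations, compute the Box-Ljung Q statistic) can be sketched in plain Python. This is not the SPSS procedure the authors used. It fits the equivalent first-order autoregression by least squares, and the series is synthetic stand-in data, not the cohort incidence.

```python
import random


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1] + e[t]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    residuals = [b - (c + phi * a) for a, b in zip(x, y)]
    return c, phi, residuals


def acf(values, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    return [
        sum((values[t] - mean) * (values[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(values, max_lag):
    """Box-Ljung Q = n(n+2) * sum_k r_k^2 / (n-k); small Q suggests white noise."""
    n = len(values)
    r = acf(values, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted recursion forward to get point forecasts."""
    out, value = [], last_value
    for _ in range(steps):
        value = c + phi * value
        out.append(value)
    return out


# Synthetic stand-in for 52 annual values (NOT the study data).
rng = random.Random(0)
series = [1.5]
for _ in range(51):
    series.append(0.5 + 0.6 * series[-1] + rng.gauss(0.0, 0.3))

c, phi, residuals = fit_ar1(series)
print(f"c={c:.3f} phi={phi:.3f} Q(10)={ljung_box_q(residuals, 10):.2f}")
print(forecast_ar1(series[-1], c, phi, steps=3))
```

To turn Q into a P value, it is compared with a chi-square distribution whose degrees of freedom equal the number of lags minus the number of fitted ARMA coefficients. That step is omitted here because it needs a statistics library.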

3.3.2. Generalized autoregressive conditional heteroskedasticity model
Three kinds of volatility graphs were obtained by taking the same population as the research object and building the model with the same dependent variable, as shown in Figure 6.

Evaluation of model prediction effect
The indicators selected to evaluate model prediction performance were the Akaike information criterion (AIC) and R², of which R² better reflects the fit between the predicted and the actual values. From Table 4, we found that the AIC value of the GARCH model is stable and that its R² is relatively large, with R² (total population) = 0.532.
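As a rough illustration of the two indicators, the sketch below computes R² and a Gaussian least-squares form of the AIC. The AIC formula used here, n ln(SS_res / n) + 2k, is one common variant that drops additive constants. The software behind Table 4 may use a different form, and the numbers are made up for the example.

```python
from math import log


def residual_sum_of_squares(actual, predicted):
    """Sum of squared differences between observed and fitted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot: the share of variance the fit explains."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - residual_sum_of_squares(actual, predicted) / ss_tot


def gaussian_aic(actual, predicted, n_params):
    """AIC under Gaussian errors, up to an additive constant (assumed variant).
    Lower is better; each extra parameter costs 2."""
    n = len(actual)
    return n * log(residual_sum_of_squares(actual, predicted) / n) + 2 * n_params


# Hypothetical observed and fitted values (not from the study).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(actual, predicted), 4))   # 0.98
print(gaussian_aic(actual, predicted, n_params=2))
```

Because the AIC penalizes parameters while R² does not, two models with similar R² can still be ranked differently by AIC.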

Discussion
Coal remains the primary energy resource in China, and coal dust exposure is one of the essential hazards during coal mining. Although the dust concentration and incidence of pneumoconiosis decreased markedly in the past decades, dust exposure in the industrial environment has not been fully controlled. [14,15] According to a cost-effectiveness study in China, the economic loss ratio of investment to recovery is 1:1.43. [16] CWP is also one of the most studied occupational diseases, but no effective treatment is available. Disease prediction is a critical step in transforming from passive prevention to active prevention and can help establish a complete prevention strategy. However, as a severe pulmonary disease, CWP has an incidence that can be affected by many factors, which makes its trends difficult to predict accurately. [17,18] Many prediction methods have been used to estimate and predict the incidence of pneumoconiosis and related diseases. Tan et al showed that a Grey GM (1, 1) model met the requirements of model predictions and can be applied to estimate new cases in the next three years. [19,20] A logistic regression model was used to predict the prevalence of pleural plaques in workers exposed to asbestos in France, and a 0.8% to 2.4% yearly increase was reported for a mean exposure of 1 f/mL. [21] Tse et al established a unique score system with excellent internal validity to assess the risk for silicosis, identify high-risk workers, and provide scientific guidance for clinical decision-making. Therefore, developing a prediction model that matches the epidemiological characteristics of each exposed population should be essential to disease prevention. Notably, CWP is affected by many risk factors other than the quality and character of the inhaled dust particles and the response time.
The ARIMA model can incorporate the mixed effects of many complex factors into time variables and comprehensively consider factors such as periodicity, long-term trends, and random fluctuations. This is a prominent advantage of time series analysis in disease prediction. [22] A previous study performed a fitting analysis on various models and predicted the incidence of syphilis in Gansu Province based on each model. [23] The GARCH model is a generalization of the ARCH model proposed by Bollerslev and can be used to deal with data with an extensive fluctuation range. [24] The emergence and expansion of the model provide important means for both probability and prediction.
This article uses the residual ACF and residual PACF diagrams of the time series model in the SPSS software to determine the p and q values of the model. It then builds a GARCH model and compares its prediction performance with that of ARIMA. By comparing the relevant predictive indicators, we found that the GARCH model is more suitable for the population of this study. Based on the above forecast results, we believe that the occupational hazards caused by dust are still worthy of our attention. Each coal worker should also receive regular physical examinations according to the "Guideline of Occupational Health Surveillance (GBZ188-2014)." Besides, applying more advanced and automated coal mine equipment is the best way to reduce the concentration of mine dust.
This article also has some shortcomings. In particular, apart from CDE, the analysis did not consider covariates such as socioeconomic status and health status. We will address these limitations in future research.

Conclusion
This study analyzed epidemiological data from 1963 to 2014 on nearly 9000 Chinese coal miners to forecast trends in CWP incidence. GARCH and ARIMA time series models were used. GARCH modeled the historical volatility in CWP rates more precisely than ARIMA. While overall incidence declined with dust reductions, risk levels remained elevated. Both models struggled to accurately predict short-term trends, highlighting the importance of sustained occupational surveillance coupled with improved exposure tracking to better protect workers against preventable respiratory disease.

3.3. Model establishment
3.3.1. ARIMA model
It can be seen from Figure 4 that the annual number of patients in the mine is relatively stable without seasonal change. Autocorrelation coefficient function (ACF) and partial correlation coefficient function (PACF) diagrams were established.

Figure 3. Relationship between different cumulative dust exposure and the incidence of CWP. CWP = coal workers’ pneumoconiosis.

Table 1
The average concentrations of dust in different workplaces (mg/m³).

Table 2
Occupational category and cumulative dust exposure of coal workers.

Table 3
Basic characteristics of the research object.
Figure 1. Incidence rate of coal workers’ pneumoconiosis.